Bayesian Q-learning with Assumed Density Filtering
نویسندگان
چکیده
While off-policy temporal difference methods have been broadly used in reinforcement learning due to their efficiency and simple implementation, their Bayesian counterparts have been relatively understudied. This is mainly because the max operator in the Bellman optimality equation brings non-linearity and inconsistent distributions over value function. In this paper, we introduce a new Bayesian approach to off-policy TD methods using Assumed Density Filtering, called ADFQ, which updates beliefs on action-values (Q) through an online Bayesian inference method. Uncertainty measures in the beliefs not only are used in exploration but they provide a natural regularization in the belief updates. We also present a connection between ADFQ and Q-learning. Our empirical results show the proposed ADFQ algorithms outperform comparing algorithms in several task domains. Moreover, our algorithms improve general drawbacks in BRL such as computational complexity, usage of uncertainty, and nonlinearity.
منابع مشابه
Assumed Density Filtering Methods for Learning Bayesian Neural Networks
Buoyed by the success of deep multilayer neural networks, there is renewed interest in scalable learning of Bayesian neural networks. Here, we study algorithms that utilize recent advances in Bayesian inference to efficiently learn distributions over network weights. In particular, we focus on recently proposed assumed density filtering based methods for learning Bayesian neural networks – Expe...
متن کاملBelief Propagation and Locally Bayesian Learning
Highlighting, a conditioning effect, consists of both primacylike and recency-like effects in human subjects. This combination of effects are notoriously difficult for Bayesian models to produce. An approximation to probabilistic inference, Locally Bayesian learning (LBL), can predict highlighting by partitioning the model into regions during learning and passing messages between these regions....
متن کاملPBODL : Parallel Bayesian Online Deep Learning for Click-Through Rate Prediction in Tencent Advertising System
We describe a parallel bayesian online deep learning framework (PBODL) for clickthrough rate (CTR) prediction within today’s Tencent advertising system, which provides quick and accurate learning of user preferences. We first explain the framework with a deep probit regression model, which is trained with probabilistic back-propagation in the mode of assumed Gaussian density filtering. Then we ...
متن کاملEstimating Continuous Distributions in Bayesian Classi ers
When modeling a probability distribution with a Bayesian network, we are faced with the problem of how to handle continuous variables. Most previous work has either solved the problem by discretizing, or assumed that the data are generated by a single Gaussian. In this paper we abandon the normality assumption and instead use statistical methods for nonparametric density estimation. For a naive...
متن کاملA Neural Network Framework for Implementing the Bayesian Learning
The research reported in the paper aims the development of a suitable neural architecture for implementing the Bayesian procedure for solving pattern recognition problems. The proposed neural system is based on an inhibitive competition installed among the hidden neurons of the computation layer. The local memories of the hidden neurons are computed adaptively according to an estimation model o...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- CoRR
دوره abs/1712.03333 شماره
صفحات -
تاریخ انتشار 2017